Exploration server on OCR

Warning: this site is under development!
Warning: this site is generated automatically from raw corpora.
The information is therefore not validated.

Whole-Book Recognition

Internal identifier: 000295 (Main/Exploration); previous: 000294; next: 000296

Whole-Book Recognition

Authors: PINGPING XIU [United States]; Henry S. Baird [United States]

Source :

RBID : Pascal:13-0020107

French descriptors

English descriptors

Abstract

Whole-book recognition is a document image analysis strategy that operates on the complete set of a book's page images, using automatic adaptation to improve accuracy. We describe an algorithm which expects to be initialized with approximate iconic and linguistic models, derived from (generally errorful) OCR results and (generally imperfect) dictionaries, and then, guided entirely by evidence internal to the test set, corrects the models, which in turn yields higher recognition accuracy. The iconic model describes image formation and determines the behavior of a character-image classifier; the linguistic model describes word-occurrence probabilities. Our algorithm detects "disagreements" between these two models by measuring the cross entropy between 1) the posterior probability distribution of character classes (the recognition results from image classification alone) and 2) the posterior probability distribution of word classes (the recognition results from image classification combined with linguistic constraints). We show how disagreements can identify candidates for model corrections at both the character and word levels. Some model corrections reduce the error rate over the whole book; these can be identified by comparing model disagreements, summed across the whole book, before and after each correction is applied. Experiments on passages up to 180 pages long show that when a candidate model adaptation reduces whole-book disagreement, it is also likely to correct recognition errors. Moreover, the longer the passage operated on by the algorithm, the more reliable this adaptation policy becomes, and the lower the error rate achieved. The best results occur when the iconic and linguistic models mutually correct one another. We have observed recognition error rates driven down by nearly an order of magnitude, fully automatically, without supervision (or indeed without any user intervention or interaction).
Improvement is nearly monotonic, and asymptotic accuracy is stable, even over long runs. Implemented naively, the algorithm runs in time quadratic in the length of the book, but random subsampling and caching techniques speed it up by two orders of magnitude with negligible loss of accuracy. Whole-book recognition has potential applications in digital libraries as a safe, unsupervised, anytime algorithm.
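The disagreement measure described in the abstract can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the class counts and posterior values below are hypothetical, and the real algorithm sums such disagreements over every character in the book before and after each candidate model correction.

```python
import math

def cross_entropy(p, q, eps=1e-12):
    # H(p, q) = -sum_i p[i] * log q[i]; grows as q diverges from p.
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))

# Hypothetical posteriors over three character classes for one glyph:
# from the iconic model (character-image classifier) alone ...
p_iconic = [0.6, 0.3, 0.1]
# ... versus image classification combined with linguistic (word) constraints.
p_linguistic = [0.1, 0.8, 0.1]

# Per-character "disagreement" between the two models.
disagreement = cross_entropy(p_linguistic, p_iconic)

# A candidate model correction would be kept if it lowers the disagreement
# summed over the whole book (here, a toy list of per-character posteriors).
book = [(p_linguistic, p_iconic)] * 3
total_disagreement = sum(cross_entropy(pl, pi) for pl, pi in book)
```

By Gibbs' inequality the cross entropy is minimized when the two posteriors agree, so a falling book-wide total is evidence that the iconic and linguistic models are being brought into alignment.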


Affiliations:


Links toward previous steps (curation, corpus...)


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Whole-Book Recognition</title>
<author>
<name sortKey="Pingping Xiu" sort="Pingping Xiu" uniqKey="Pingping Xiu" last="Pingping Xiu">PINGPING XIU</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Microsoft Advertising R&amp;D, One Microsoft Way</s1>
<s2>Redmond, WA 98052-6399</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Baird, Henry S" sort="Baird, Henry S" uniqKey="Baird H" first="Henry S." last="Baird">Henry S. Baird</name>
<affiliation wicri:level="2">
<inist:fA14 i1="02">
<s1>Department of Computer Science and Engineering, Lehigh University, 19 Memorial Drive West</s1>
<s2>Bethlehem, PA 18015</s2>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Pennsylvanie</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">13-0020107</idno>
<date when="2012">2012</date>
<idno type="stanalyst">PASCAL 13-0020107 INIST</idno>
<idno type="RBID">Pascal:13-0020107</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000069</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000699</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000057</idno>
<idno type="wicri:doubleKey">0162-8828:2012:Pingping Xiu:whole:book:recognition</idno>
<idno type="wicri:Area/Main/Merge">000298</idno>
<idno type="wicri:Area/Main/Curation">000295</idno>
<idno type="wicri:Area/Main/Exploration">000295</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Whole-Book Recognition</title>
<author>
<name sortKey="Pingping Xiu" sort="Pingping Xiu" uniqKey="Pingping Xiu" last="Pingping Xiu">PINGPING XIU</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Microsoft Advertising R&amp;D, One Microsoft Way</s1>
<s2>Redmond, WA 98052-6399</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Baird, Henry S" sort="Baird, Henry S" uniqKey="Baird H" first="Henry S." last="Baird">Henry S. Baird</name>
<affiliation wicri:level="2">
<inist:fA14 i1="02">
<s1>Department of Computer Science and Engineering, Lehigh University, 19 Memorial Drive West</s1>
<s2>Bethlehem, PA 18015</s2>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Pennsylvanie</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">IEEE transactions on pattern analysis and machine intelligence</title>
<title level="j" type="abbreviated">IEEE trans. pattern anal. mach. intell.</title>
<idno type="ISSN">0162-8828</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">IEEE transactions on pattern analysis and machine intelligence</title>
<title level="j" type="abbreviated">IEEE trans. pattern anal. mach. intell.</title>
<idno type="ISSN">0162-8828</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Adaptive method</term>
<term>Anytime algorithm</term>
<term>Asymptotic approximation</term>
<term>Book</term>
<term>Character recognition</term>
<term>Defect</term>
<term>Dictionaries</term>
<term>Document analysis</term>
<term>Entropy</term>
<term>Error rate</term>
<term>Image analysis</term>
<term>Image classification</term>
<term>Image processing</term>
<term>Image recognition</term>
<term>Linguistic model</term>
<term>Linguistics</term>
<term>Modeling</term>
<term>Natural language</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Policy</term>
<term>Posterior distribution</term>
<term>Posterior probability</term>
<term>Probabilistic approach</term>
<term>Probability distribution</term>
<term>Remolded sample</term>
<term>Unsupervised learning</term>
<term>User interface</term>
<term>Word</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Analyse documentaire</term>
<term>Analyse image</term>
<term>Traitement image</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Dictionnaire</term>
<term>Reconnaissance forme</term>
<term>Mot</term>
<term>Langage naturel</term>
<term>Reconnaissance image</term>
<term>Linguistique</term>
<term>Interface utilisateur</term>
<term>Livre</term>
<term>Modèle linguistique</term>
<term>Défaut</term>
<term>Loi a posteriori</term>
<term>Probabilité a posteriori</term>
<term>Modélisation</term>
<term>Approche probabiliste</term>
<term>Entropie</term>
<term>Loi probabilité</term>
<term>Taux erreur</term>
<term>Approximation asymptotique</term>
<term>Echantillon remanié</term>
<term>Algorithme anytime</term>
<term>Méthode adaptative</term>
<term>Politique</term>
<term>Classification image</term>
<term>Apprentissage non supervisé</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Dictionnaire</term>
<term>Linguistique</term>
<term>Politique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Whole-book recognition is a document image analysis strategy that operates on the complete set of a book's page images, using automatic adaptation to improve accuracy. We describe an algorithm which expects to be initialized with approximate iconic and linguistic models, derived from (generally errorful) OCR results and (generally imperfect) dictionaries, and then, guided entirely by evidence internal to the test set, corrects the models, which in turn yields higher recognition accuracy. The iconic model describes image formation and determines the behavior of a character-image classifier; the linguistic model describes word-occurrence probabilities. Our algorithm detects "disagreements" between these two models by measuring the cross entropy between 1) the posterior probability distribution of character classes (the recognition results from image classification alone) and 2) the posterior probability distribution of word classes (the recognition results from image classification combined with linguistic constraints). We show how disagreements can identify candidates for model corrections at both the character and word levels. Some model corrections reduce the error rate over the whole book; these can be identified by comparing model disagreements, summed across the whole book, before and after each correction is applied. Experiments on passages up to 180 pages long show that when a candidate model adaptation reduces whole-book disagreement, it is also likely to correct recognition errors. Moreover, the longer the passage operated on by the algorithm, the more reliable this adaptation policy becomes, and the lower the error rate achieved. The best results occur when the iconic and linguistic models mutually correct one another. We have observed recognition error rates driven down by nearly an order of magnitude, fully automatically, without supervision (or indeed without any user intervention or interaction). Improvement is nearly monotonic, and asymptotic accuracy is stable, even over long runs. Implemented naively, the algorithm runs in time quadratic in the length of the book, but random subsampling and caching techniques speed it up by two orders of magnitude with negligible loss of accuracy. Whole-book recognition has potential applications in digital libraries as a safe, unsupervised, anytime algorithm.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Pennsylvanie</li>
<li>Washington (État)</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Washington (État)">
<name sortKey="Pingping Xiu" sort="Pingping Xiu" uniqKey="Pingping Xiu" last="Pingping Xiu">PINGPING XIU</name>
</region>
<name sortKey="Baird, Henry S" sort="Baird, Henry S" uniqKey="Baird H" first="Henry S." last="Baird">Henry S. Baird</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000295 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000295 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:13-0020107
   |texte=   Whole-Book Recognition
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024